Time evolution of link length distribution in PRL collaboration network 
An important aspect of a Euclidean network is its link length distribution, studied in a few real 
networks so far. We compute the distribution of the link lengths between collaborators whose papers 
appear in the Physical Review Letters (PRL) in several years within a range of four decades. The 
distribution is non-monotonic; there is a peak at nearest neighbour distances followed by a sharp 
fall and a subsequent rise at larger distances. The behaviour of the statistical properties of the 
distribution with time indicates that collaborations might become distance independent in about 
thirty to forty years. 
Ever since the discovery of the small world effect in a
variety of networks [1], the study of real world networks and
their theoretical modelling has generated tremendous activity.
A network is equivalent to a graph and is characterised by
the links which connect pairs of nodes. Based on observations
and theoretical arguments, it has been established that
factors like preferential attachment, duplication, aging etc.
are responsible for determining the connectivity in many real
world networks [2].

In a Euclidean network, where the nodes are embedded in a
Euclidean space, it can be expected that the distance between
nodes will play an important role in determining whether a
link will connect them. In several theoretical models of
Euclidean networks, the link length distribution has been
assumed to have a power law decay.

Linking schemes have been studied in a few real world networks
in which geographical distance plays an important role. These
are the internet [4], transport networks [5], neural networks
[6] and some collaboration networks [7-10]. In this article,
we report a study of the link length distribution in a network
of collaborators whose papers appear in Physical Review
Letters. We also study this distribution at different times,
as the network is dynamic and reflects the evolution of both
communication and human interactions.

A scientific collaboration network is a social network [11,12]
in which close personal encounters are essential to a large
extent, and the existence of links between authors is expected
to depend on the distance separating them. Communication is
the key factor in a collaboration and it has undergone
revolutionary changes over the years. This effect should
manifest itself in the time evolution of the link length
distribution. We have therefore studied its behaviour over
four decades.

To obtain the link length distribution, one should take the
collaboration network and calculate the geographical distances
separating the host institutes of the authors who share a
link. However, this is a formidable task. We have obtained the
distance distribution in an indirect way. Noting that the
collaboration acts are the papers, the distances between the
co-authors of a particular paper also supply the necessary
data. We have therefore taken sample papers (at least 200 for
each year) from Physical Review Letters (PRL) and calculated
the geographical distance between each pair of authors in a
coarse grained manner for nine different years between 1965
and 2005, and obtained the link length distributions.

The pairwise distances $l$ give the distribution $P(l)$ of
the distance between two collaborating authors. We have also
defined a distance factor $d$ for each paper, where $d$ is the
average of the pairwise distances of the authors coauthoring
that paper. The corresponding distribution $Q(d)$ has also
been computed. For example, let there be a paper authored by
three scientists and let $l_{12}$, $l_{13}$, $l_{23}$ be the
pairwise distances. Then $d = (l_{12} + l_{13} + l_{23})/3$.
Note that in $P(l)$, the fact that $l_{12}$, $l_{13}$ and
$l_{23}$ are obtained from a single collaboration act is
missing. Hence, in a sense, $Q(d)$ takes care of the
correlation between the distances. Let us call $Q(d)$ the
correlated distance distribution.
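
The three-author example can be written for a general paper. As a sketch, assuming a paper with $n \geq 2$ authors (the symbol $n$ is introduced here for illustration and is not used in the text above), the average runs over all $n(n-1)/2$ pairs:

```latex
d = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} l_{ij}
```

For $n = 3$ this reduces to $d = (l_{12} + l_{13} + l_{23})/3$, and for $n = 2$ it gives $d = l_{12}$, so that each two-author paper contributes the same value to $P(l)$ and $Q(d)$.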

In principle, the actual geographical distances have to be
computed, which is non-trivial. We have coarse grained the
distances in a convenient way. To author X in a paper we
associate the indices $x_1$, $x_2$, $x_3$ and $x_4$ (the
$x_i$'s are integers), which represent the
University/Institute, city, country and continent of X
respectively. Similar indices $y_1$, $y_2$, $y_3$ and $y_4$
are defined for author Y. If, for example, authors X and Y
belong to the same institute, $x_i = y_i$ for all $i$. On the
other hand, if they are from different countries but from the
same continent, $x_4 = y_4$ but $x_i \neq y_i$ for $i < 4$. We
find the maximum value of $k$ for which $x_k \neq y_k$. The
distance between X and Y is then $l_{XY} = k + 1$. If
$x_i = y_i$ for all values of $i$, then $l_{XY} = 1$ according
to our definition. As an example, one may consider the paper
PRL 64 2870 (1990), which features 4 authors. Here authors 1
and 2 are from the same institute in Calcutta, India, and are
assigned the variables 1, 1, 1, 1. The 3rd author belongs to a
different institute in Calcutta and therefore gets the indices
2, 1, 1, 1. The last author is from an institute in Bombay,
India, and is assigned the variables 3, 2, 1, 1. Hence
$l_{12} = 1$, $l_{13} = l_{23} = 2$,
$l_{14} = l_{24} = l_{34} = 3$ and the average $d = 2.333$.
Defining the distances in this way, the values of $l$ are
discrete while the $d$ values have a continuous variation. For
papers with two authors, the two distributions are identical,
but they will differ in general.
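
The coarse-graining rule can be sketched in a few lines of Python. This is a minimal illustration of the definition given above, not the procedure actually used to process the data; the function names `coarse_distance` and `paper_distances` are hypothetical, and each author is assumed to be given as a tuple of four integer indices (institute, city, country, continent).

```python
from itertools import combinations


def coarse_distance(x, y):
    """Coarse-grained distance between two authors.

    x and y are tuples of integer indices in the order
    (institute, city, country, continent).
    """
    # Largest level k (counted from 1) at which the indices differ;
    # k = 0 when all indices agree, which gives distance 1.
    k = max((i + 1 for i, (a, b) in enumerate(zip(x, y)) if a != b), default=0)
    return k + 1


def paper_distances(authors):
    """Return the pairwise distances and their average d for one paper.

    Assumes the paper has at least two authors.
    """
    ls = [coarse_distance(x, y) for x, y in combinations(authors, 2)]
    return ls, sum(ls) / len(ls)


# Index assignment quoted in the text for PRL 64 2870 (1990).
authors = [(1, 1, 1, 1), (1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 1, 1)]
ls, d = paper_distances(authors)
print(ls, round(d, 3))
```

Collecting the entries of `ls` over all sampled papers of a given year would give a histogram corresponding to $P(l)$, while collecting one `d` per paper would give $Q(d)$.
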
FIG. 1. Distance distribution $P(l)$ as a function of distance
$l$ for different years.

FIG. 2. Correlated distance distribution $Q(d)$ as a function
of distance $d$ for different years.

We have made an exception for authors in the USA, since it is
a big country comparable in size to Europe, which consists of
many countries. Thus two authors belonging to, say, Kentucky
and Maryland will have different country indices, i.e.,
$x_3 \neq y_3$.

Some papers like the experimental high energy physics 
ones typically involve many authors and many institutes. 
We have considered an upper bound, equal to 20, to the 
number of institutes and no bounds for the number of 
authors. In case of multiple addresses, only the first one 
has been considered. 

Both the distributions $P(l)$ and $Q(d)$ have the following
features:

1. A peak at $l$ or $d = 1$.

2. A sharp fall at around $l$ or $d = 1$ and a subsequent
rise. The fall becomes less steep with time.

3. Even for the most recent data, the peak at nearest
neighbour distances is quite dominant. However, with the
passage of time, the peak value at nearest neighbour distances
shrinks while the probability at larger distances increases.

In Figs. 1 and 2, the distributions $P(l)$ and $Q(d)$ are
shown. The two distributions have similar features but differ
in magnitude, more so in recent years, when the number of
authors is significantly different from two in many papers.
The data for $Q(d)$ apparently has an oscillatory nature for
larger values of $d$. However, we believe that these
oscillations are due to the coarse graining of the data and it
is more likely that the peak at the nearest neighbour
distances is followed by a crest and a gentle hump at larger
distances. The hump grows in size with time while the peak
value at nearest neighbour distances diminishes.
FIG. 3. The mean value and standard deviation of the distances
$d$ increase with time while the roughness of the distance
distribution $Q(d)$ shows a steady decrease.

We make a detailed analysis of $Q(d)$, the correlated distance
distribution. In Fig. 3, we present the results. The mean
increases appreciably, consistent with our idea that with the
progress of time there will be more collaborations involving
people working at a distance. The fluctuation (standard
deviation) also shows an increase, although its increase is
not that remarkable since the total range of interaction
remains fixed in our convention. If collaborations were really
distance independent, the distributions $Q(d)$ and $P(l)$
would have looked flat. We have estimated the deviation of
$Q(d)$ from a flat distribution by calculating its "roughness"
$R_Q$, defined as
$\sqrt{\langle (Q(d) - \bar{Q}(d))^2 \rangle}$, where
$\bar{Q}(d)$ is the mean value of $Q(d)$. $R_Q$ shows a
decrease with time which is approximately linear.
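
The roughness can be illustrated with a short Python sketch. It assumes that the average $\langle \cdot \rangle$ is taken over the bins of a binned distribution, which the text does not state explicitly; the function name `roughness` is hypothetical and the bin values below are made-up illustrative numbers, not data from this study.

```python
import math


def roughness(q):
    """Root-mean-square deviation of the binned values q from their mean."""
    mean_q = sum(q) / len(q)
    return math.sqrt(sum((v - mean_q) ** 2 for v in q) / len(q))


# Illustrative values only (not measured data): a flat distribution and a
# distribution dominated by the nearest neighbour bin.
flat = [0.2, 0.2, 0.2, 0.2, 0.2]
peaked = [0.6, 0.05, 0.1, 0.15, 0.1]

print(roughness(flat))
print(roughness(peaked))
```

A flat distribution has a roughness of essentially zero, and the roughness grows as the weight concentrates in a single bin, which is why a decreasing roughness is read as a move towards distance independence.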

The above results imply that even with the communication
revolution, most collaborations take place among nearest
geographical neighbours. The drop near $d = 2$ may be
justified by the fact that most cities have only one
university/institute, so when one collaborates with an
outsider, she or he belongs to some other city or country in
most cases. There is some indication that in the not too
distant future collaborations will become almost distance
independent, as in Fig. 3 $R_Q$ seems to vanish at around 2040
when extrapolated. It may also happen that $R_Q$ saturates to
a finite value in the coming years, and perhaps it is too
early to predict anything definite.
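
The kind of extrapolation referred to here can be sketched as a straight-line fit followed by solving for the zero crossing. The text does not say how the extrapolation was carried out, so ordinary least squares is an assumption, and the function names `linear_fit` and `zero_crossing` are hypothetical. The numbers below are synthetic placeholders lying exactly on a line chosen to cross zero at 2040; they are NOT the values plotted in Fig. 3.

```python
def linear_fit(ts, ys):
    """Ordinary least-squares line y = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def zero_crossing(ts, ys):
    """Value of t at which the fitted line reaches zero."""
    slope, intercept = linear_fit(ts, ys)
    return -intercept / slope


# Synthetic placeholder values (NOT data from the paper).
years = [1965, 1975, 1985, 1995, 2005]
r_q = [0.15, 0.13, 0.11, 0.09, 0.07]

print(zero_crossing(years, r_q))
```

With real data the fitted crossing year would carry an uncertainty set by the scatter of the points, and a saturating $R_Q$ would make the linear extrapolation invalid altogether.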

What is the nature of the distribution when the real distances
are considered? We notice that there is a sharp decrease of
$Q(d)$ with $d$ initially, which may be assumed to be
exponential in nature. The way we have defined $l$ (or $d$),
it may be assumed that the true distances $d_{\mathrm{real}}$
scale roughly as $\exp(a d)$, where $a$ is a number of the
order of unity. In that case, the initial exponential decay of
$Q(d)$ with $d$ corresponds to a power law decrease with
$d_{\mathrm{real}}$. The subsequent rise of the distribution
with $d$ should also show up against $d_{\mathrm{real}}$.
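
The step from exponential to power law can be spelled out. As a sketch, assume the initial decay is $Q(d) \sim e^{-b d}$, where the decay rate $b$ is a symbol introduced here for illustration, and that the true distance scales exponentially with $d$ at rate $a$:

```latex
Q(d) \sim e^{-b d}, \qquad d_{\mathrm{real}} \sim e^{a d}
\;\Longrightarrow\; d \simeq \frac{1}{a} \ln d_{\mathrm{real}}
\;\Longrightarrow\; Q \sim e^{-(b/a) \ln d_{\mathrm{real}}}
  = d_{\mathrm{real}}^{-b/a}
```

If the distribution is instead expressed as a density per unit real distance, the change of variables contributes an extra factor proportional to $1/d_{\mathrm{real}}$, so the decay remains a power law with exponent $-(1 + b/a)$.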

In summary, we have studied the link length distributions in
the Euclidean network of collaborators of PRL papers. Unlike
for other features of a network, e.g., degree distribution or
aging, we do not find a conventional power law or exponential
decay but rather a non-monotonic behaviour. The data over
different times show that the communication revolution has
indeed influenced long distance collaborations to a
considerable extent.